Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 20707 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.1 MiB |
| Average record size in memory | 104.0 B |
Variable types
| Categorical | 2 |
|---|---|
| Numeric | 11 |
Alerts
| Alert | Type |
|---|---|
| id has a high cardinality: 20707 distinct values | High cardinality |
| num_nodes is highly correlated with num_tweets and 2 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc and 2 other fields | High correlation |
| retweet_perc is highly correlated with avg_num_retweet and 1 other field | High correlation |
| num_users is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_followers is highly correlated with avg_num_retweet and 2 other fields | High correlation |
| avg_num_friends is highly correlated with avg_num_followers | High correlation |
| avg_time_diff is highly correlated with avg_num_retweet and 2 other fields | High correlation |
| users_10h is highly correlated with num_nodes and 2 other fields | High correlation |
| num_nodes is highly correlated with num_tweets and 2 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 2 other fields | High correlation |
| users_10h is highly correlated with num_nodes and 2 other fields | High correlation |
| num_nodes is highly correlated with num_tweets and 2 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc and 1 other field | High correlation |
| retweet_perc is highly correlated with avg_num_retweet | High correlation |
| num_users is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_time_diff is highly correlated with avg_num_retweet | High correlation |
| users_10h is highly correlated with num_nodes and 2 other fields | High correlation |
| label is highly correlated with retweet_perc | High correlation |
| num_nodes is highly correlated with num_tweets and 2 other fields | High correlation |
| num_tweets is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly correlated with retweet_perc | High correlation |
| retweet_perc is highly correlated with label and 2 other fields | High correlation |
| num_users is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_followers is highly correlated with avg_num_friends | High correlation |
| avg_num_friends is highly correlated with avg_num_followers | High correlation |
| perc_post_1_hour is highly correlated with retweet_perc | High correlation |
| users_10h is highly correlated with num_nodes and 2 other fields | High correlation |
| avg_num_retweet is highly skewed (γ1 = 24.90205017) | Skewed |
| avg_num_followers is highly skewed (γ1 = 43.04629615) | Skewed |
| avg_time_diff is highly skewed (γ1 = 21.51128756) | Skewed |
| id is uniformly distributed | Uniform |
| id has unique values | Unique |
| avg_num_retweet has 11573 (55.9%) zeros | Zeros |
| avg_time_diff has 11586 (56.0%) zeros | Zeros |
Reproduction
| Analysis started | 2021-11-08 10:06:07.069668 |
|---|---|
| Analysis finished | 2021-11-08 10:06:51.907060 |
| Duration | 44.84 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
label
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 161.9 KiB |
| real | 15648 |
|---|---|
| fake | 5059 |
Length
| Max length | 4 |
|---|---|
| Median length | 4 |
| Mean length | 4 |
| Min length | 4 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | fake |
|---|---|
| 2nd row | fake |
| 3rd row | fake |
| 4th row | fake |
| 5th row | fake |
Common Values
| Value | Count | Frequency (%) |
| real | 15648 | 75.6% |
| fake | 5059 | 24.4% |
Length
Histogram of lengths of the category
Pie chart
num_nodes
| Distinct | 908 |
|---|---|
| Distinct (%) | 4.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 83.51567103 |
| Minimum | 2 |
|---|---|
| Maximum | 4494 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 15 |
| median | 40 |
| Q3 | 65 |
| 95-th percentile | 212.7 |
| Maximum | 4494 |
| Range | 4492 |
| Interquartile range (IQR) | 50 |
Descriptive statistics
| Standard deviation | 228.0398023 |
|---|---|
| Coefficient of variation (CV) | 2.730503144 |
| Kurtosis | 62.58983601 |
| Mean | 83.51567103 |
| Median Absolute Deviation (MAD) | 25 |
| Skewness | 7.06566585 |
| Sum | 1729359 |
| Variance | 52002.15144 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2 | 1216 | 5.9% |
| 3 | 469 | 2.3% |
| 7 | 343 | 1.7% |
| 5 | 341 | 1.6% |
| 4 | 341 | 1.6% |
| 6 | 338 | 1.6% |
| 10 | 326 | 1.6% |
| 15 | 316 | 1.5% |
| 9 | 312 | 1.5% |
| 12 | 300 | 1.4% |
| Other values (898) | 16405 |
| Value | Count | Frequency (%) |
| 2 | 1216 | |
| 3 | 469 | 2.3% |
| 4 | 341 | 1.6% |
| 5 | 341 | 1.6% |
| 6 | 338 | 1.6% |
| 7 | 343 | 1.7% |
| 8 | 299 | 1.4% |
| 9 | 312 | 1.5% |
| 10 | 326 | 1.6% |
| 11 | 288 | 1.4% |
| Value | Count | Frequency (%) |
| 4494 | 1 | |
| 3583 | 1 | |
| 3505 | 1 | |
| 3455 | 1 | |
| 3220 | 1 | |
| 3200 | 1 | |
| 3193 | 1 | |
| 3070 | 1 | |
| 2941 | 1 | |
| 2939 | 1 |
num_tweets
| Distinct | 667 |
|---|---|
| Distinct (%) | 3.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 58.47336649 |
| Minimum | 1 |
|---|---|
| Maximum | 1730 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 12 |
| median | 36 |
| Q3 | 58 |
| 95-th percentile | 141 |
| Maximum | 1730 |
| Range | 1729 |
| Interquartile range (IQR) | 46 |
Descriptive statistics
| Standard deviation | 121.1044079 |
|---|---|
| Coefficient of variation (CV) | 2.071103737 |
| Kurtosis | 36.98730366 |
| Mean | 58.47336649 |
| Median Absolute Deviation (MAD) | 23 |
| Skewness | 5.663174855 |
| Sum | 1210808 |
| Variance | 14666.2776 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1312 | 6.3% |
| 2 | 507 | 2.4% |
| 3 | 435 | 2.1% |
| 4 | 386 | 1.9% |
| 5 | 378 | 1.8% |
| 6 | 347 | 1.7% |
| 9 | 330 | 1.6% |
| 14 | 324 | 1.6% |
| 8 | 318 | 1.5% |
| 11 | 306 | 1.5% |
| Other values (657) | 16064 |
| Value | Count | Frequency (%) |
| 1 | 1312 | |
| 2 | 507 | 2.4% |
| 3 | 435 | 2.1% |
| 4 | 386 | 1.9% |
| 5 | 378 | 1.8% |
| 6 | 347 | 1.7% |
| 7 | 297 | 1.4% |
| 8 | 318 | 1.5% |
| 9 | 330 | 1.6% |
| 10 | 289 | 1.4% |
| Value | Count | Frequency (%) |
| 1730 | 1 | |
| 1632 | 1 | |
| 1585 | 1 | |
| 1551 | 1 | |
| 1516 | 1 | |
| 1414 | 1 | |
| 1401 | 2 | |
| 1342 | 1 | |
| 1319 | 2 | |
| 1278 | 1 |
avg_num_retweet
Real number (ℝ≥0)
HIGH CORRELATION (4 alerts), SKEWED, ZEROS
| Distinct | 2969 |
|---|---|
| Distinct (%) | 14.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.2229082209 |
| Minimum | 0 |
|---|---|
| Maximum | 51 |
| Zeros | 11573 |
| Zeros (%) | 55.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.148474567 |
| 95-th percentile | 1.038843385 |
| Maximum | 51 |
| Range | 51 |
| Interquartile range (IQR) | 0.148474567 |
Descriptive statistics
| Standard deviation | 0.9019846535 |
|---|---|
| Coefficient of variation (CV) | 4.046439606 |
| Kurtosis | 1071.340429 |
| Mean | 0.2229082209 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 24.90205017 |
| Sum | 4615.760529 |
| Variance | 0.8135763151 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 11573 | |
| 0.3333333333 | 171 | 0.8% |
| 1 | 163 | 0.8% |
| 0.2 | 159 | 0.8% |
| 0.5 | 151 | 0.7% |
| 0.25 | 137 | 0.7% |
| 0.1666666667 | 123 | 0.6% |
| 0.1111111111 | 95 | 0.5% |
| 0.1428571429 | 88 | 0.4% |
| 0.1 | 86 | 0.4% |
| Other values (2959) | 7961 |
| Value | Count | Frequency (%) |
| 0 | 11573 | |
| 0.004807692308 | 1 | < 0.1% |
| 0.005434782609 | 1 | < 0.1% |
| 0.005780346821 | 1 | < 0.1% |
| 0.005787037037 | 1 | < 0.1% |
| 0.006269592476 | 1 | < 0.1% |
| 0.006329113924 | 1 | < 0.1% |
| 0.006535947712 | 1 | < 0.1% |
| 0.007142857143 | 1 | < 0.1% |
| 0.007407407407 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 51 | 1 | |
| 47.5 | 1 | |
| 34 | 1 | |
| 27 | 1 | |
| 25 | 1 | |
| 20.5 | 1 | |
| 20 | 1 | |
| 18 | 1 | |
| 17.4 | 1 | |
| 16.5 | 1 |
retweet_perc
| Distinct | 3082 |
|---|---|
| Distinct (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.173209762 |
| Minimum | 0.003125 |
|---|---|
| Maximum | 0.980952381 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 0.003125 |
|---|---|
| 5-th percentile | 0.01538461538 |
| Q1 | 0.02777777778 |
| median | 0.07692307692 |
| Q3 | 0.2727272727 |
| 95-th percentile | 0.5394736842 |
| Maximum | 0.980952381 |
| Range | 0.977827381 |
| Interquartile range (IQR) | 0.2449494949 |
Descriptive statistics
| Standard deviation | 0.191102693 |
|---|---|
| Coefficient of variation (CV) | 1.103302093 |
| Kurtosis | 0.8439326292 |
| Mean | 0.173209762 |
| Median Absolute Deviation (MAD) | 0.05805515239 |
| Skewness | 1.314962197 |
| Sum | 3586.654542 |
| Variance | 0.03652023929 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.5 | 1395 | 6.7% |
| 0.3333333333 | 574 | 2.8% |
| 0.25 | 400 | 1.9% |
| 0.2 | 339 | 1.6% |
| 0.1666666667 | 305 | 1.5% |
| 0.1428571429 | 284 | 1.4% |
| 0.1111111111 | 271 | 1.3% |
| 0.06666666667 | 268 | 1.3% |
| 0.1 | 267 | 1.3% |
| 0.05555555556 | 253 | 1.2% |
| Other values (3072) | 16351 |
| Value | Count | Frequency (%) |
| 0.003125 | 1 | |
| 0.00462962963 | 1 | |
| 0.005524861878 | 1 | |
| 0.005847953216 | 1 | |
| 0.005882352941 | 1 | |
| 0.006097560976 | 1 | |
| 0.006289308176 | 1 | |
| 0.00641025641 | 1 | |
| 0.006756756757 | 1 | |
| 0.006802721088 | 2 |
| Value | Count | Frequency (%) |
| 0.980952381 | 1 | |
| 0.9795918367 | 1 | |
| 0.9718309859 | 1 | |
| 0.9649122807 | 1 | |
| 0.962962963 | 1 | |
| 0.9545454545 | 2 | |
| 0.95 | 1 | |
| 0.9462365591 | 1 | |
| 0.9444444444 | 1 | |
| 0.9436619718 | 1 |
num_users
| Distinct | 840 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 69.04882407 |
| Minimum | 1 |
|---|---|
| Maximum | 3071 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 11 |
| median | 35 |
| Q3 | 54 |
| 95-th percentile | 180.7 |
| Maximum | 3071 |
| Range | 3070 |
| Interquartile range (IQR) | 43 |
Descriptive statistics
| Standard deviation | 183.0438335 |
|---|---|
| Coefficient of variation (CV) | 2.650933394 |
| Kurtosis | 57.27780374 |
| Mean | 69.04882407 |
| Median Absolute Deviation (MAD) | 22 |
| Skewness | 6.827869215 |
| Sum | 1429794 |
| Variance | 33505.04499 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 1290 | 6.2% |
| 2 | 559 | 2.7% |
| 5 | 444 | 2.1% |
| 3 | 417 | 2.0% |
| 4 | 402 | 1.9% |
| 9 | 393 | 1.9% |
| 8 | 388 | 1.9% |
| 7 | 360 | 1.7% |
| 6 | 358 | 1.7% |
| 11 | 357 | 1.7% |
| Other values (830) | 15739 |
| Value | Count | Frequency (%) |
| 1 | 1290 | |
| 2 | 559 | |
| 3 | 417 | 2.0% |
| 4 | 402 | 1.9% |
| 5 | 444 | 2.1% |
| 6 | 358 | 1.7% |
| 7 | 360 | 1.7% |
| 8 | 388 | 1.9% |
| 9 | 393 | 1.9% |
| 10 | 333 | 1.6% |
| Value | Count | Frequency (%) |
| 3071 | 1 | |
| 2880 | 1 | |
| 2877 | 1 | |
| 2655 | 1 | |
| 2448 | 1 | |
| 2395 | 1 | |
| 2364 | 1 | |
| 2323 | 1 | |
| 2313 | 1 | |
| 2290 | 1 |
total_propagation_time
Real number (ℝ≥0)
| Distinct | 20415 |
|---|---|
| Distinct (%) | 98.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1513952418 |
| Minimum | 1210434286 |
|---|---|
| Maximum | 1545330351 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 1210434286 |
|---|---|
| 5-th percentile | 1493292819 |
| Q1 | 1504002795 |
| median | 1515167137 |
| Q3 | 1524636878 |
| 95-th percentile | 1533539483 |
| Maximum | 1545330351 |
| Range | 334896065 |
| Interquartile range (IQR) | 20634083.5 |
Descriptive statistics
| Standard deviation | 14860078.73 |
|---|---|
| Coefficient of variation (CV) | 0.009815419926 |
| Kurtosis | 50.16884292 |
| Mean | 1513952418 |
| Median Absolute Deviation (MAD) | 10181844 |
| Skewness | -3.339417445 |
| Sum | 3.134941272 × 10¹³ |
| Variance | 2.208219399 × 10¹⁴ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1535993091 | 12 | 0.1% |
| 1535595698 | 11 | 0.1% |
| 1522096409 | 6 | < 0.1% |
| 1536364695 | 5 | < 0.1% |
| 1535367862 | 5 | < 0.1% |
| 1530614029 | 4 | < 0.1% |
| 1536157157 | 4 | < 0.1% |
| 1536305868 | 4 | < 0.1% |
| 1523039834 | 4 | < 0.1% |
| 1520062874 | 4 | < 0.1% |
| Other values (20405) | 20648 |
| Value | Count | Frequency (%) |
| 1210434286 | 1 | |
| 1218609042 | 1 | |
| 1223914841 | 1 | |
| 1246219422 | 1 | |
| 1267216672 | 1 | |
| 1273757458 | 1 | |
| 1289331628 | 1 | |
| 1310067855 | 1 | |
| 1311902283 | 1 | |
| 1312339876 | 1 |
| Value | Count | Frequency (%) |
| 1545330351 | 1 | < 0.1% |
| 1545270471 | 1 | < 0.1% |
| 1544968941 | 1 | < 0.1% |
| 1544960557 | 1 | < 0.1% |
| 1544958577 | 3 | |
| 1544940820 | 1 | < 0.1% |
| 1544879266 | 1 | < 0.1% |
| 1544873950 | 1 | < 0.1% |
| 1544849471 | 1 | < 0.1% |
| 1544818623 | 1 | < 0.1% |
avg_num_followers
| Distinct | 19405 |
|---|---|
| Distinct (%) | 93.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43606.33895 |
| Minimum | 0 |
|---|---|
| Maximum | 11710354 |
| Zeros | 7 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 513.1 |
| Q1 | 1912.9375 |
| median | 4189.958333 |
| Q3 | 40927.0625 |
| 95-th percentile | 208577.1552 |
| Maximum | 11710354 |
| Range | 11710354 |
| Interquartile range (IQR) | 39014.125 |
Descriptive statistics
| Standard deviation | 131890.3209 |
|---|---|
| Coefficient of variation (CV) | 3.024567621 |
| Kurtosis | 3329.7346 |
| Mean | 43606.33895 |
| Median Absolute Deviation (MAD) | 3077.958333 |
| Skewness | 43.04629615 |
| Sum | 902956460.7 |
| Variance | 1.739505674 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3937 | 342 | 1.7% |
| 1329 | 177 | 0.9% |
| 1112 | 139 | 0.7% |
| 317729 | 77 | 0.4% |
| 48 | 77 | 0.4% |
| 9 | 27 | 0.1% |
| 2524.5 | 17 | 0.1% |
| 1992.5 | 17 | 0.1% |
| 68.95454545 | 17 | 0.1% |
| 3 | 15 | 0.1% |
| Other values (19395) | 19802 |
| Value | Count | Frequency (%) |
| 0 | 7 | |
| 1 | 12 | |
| 1.055555556 | 1 | < 0.1% |
| 2 | 9 | |
| 3 | 15 | |
| 4 | 2 | < 0.1% |
| 4.733333333 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 4 | < 0.1% |
| 7 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 11710354 | 1 | |
| 5860708 | 1 | |
| 5855249 | 1 | |
| 2295190.6 | 1 | |
| 1694395 | 1 | |
| 1459946 | 1 | |
| 1306683 | 1 | |
| 1172399.2 | 1 | |
| 1066707.273 | 1 | |
| 953163.625 | 1 |
avg_num_friends
| Distinct | 19131 |
|---|---|
| Distinct (%) | 92.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2332.05777 |
| Minimum | 0 |
|---|---|
| Maximum | 111645 |
| Zeros | 109 |
| Zeros (%) | 0.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 406.9269231 |
| Q1 | 1142.198684 |
| median | 1784.2 |
| Q3 | 2854.25 |
| 95-th percentile | 5712.586218 |
| Maximum | 111645 |
| Range | 111645 |
| Interquartile range (IQR) | 1712.051316 |
Descriptive statistics
| Standard deviation | 2583.053995 |
|---|---|
| Coefficient of variation (CV) | 1.107628648 |
| Kurtosis | 383.8004211 |
| Mean | 2332.05777 |
| Median Absolute Deviation (MAD) | 774.0545455 |
| Skewness | 13.38117773 |
| Sum | 48289920.25 |
| Variance | 6672167.943 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1128 | 342 | 1.7% |
| 1496 | 177 | 0.9% |
| 69 | 141 | 0.7% |
| 0 | 109 | 0.5% |
| 723 | 80 | 0.4% |
| 564 | 22 | 0.1% |
| 104.1818182 | 17 | 0.1% |
| 598.5 | 17 | 0.1% |
| 221 | 16 | 0.1% |
| 27 | 16 | 0.1% |
| Other values (19121) | 19770 |
| Value | Count | Frequency (%) |
| 0 | 109 | |
| 1 | 8 | < 0.1% |
| 2 | 5 | < 0.1% |
| 3 | 1 | < 0.1% |
| 5 | 3 | < 0.1% |
| 5.4 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 6.333333333 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8.5 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 111645 | 1 | |
| 93133 | 1 | |
| 84111.16667 | 1 | |
| 75835.5 | 1 | |
| 71258.57143 | 1 | |
| 56129.5 | 1 | |
| 55836 | 1 | |
| 55126.88889 | 1 | |
| 46863 | 1 | |
| 46573 | 1 |
avg_time_diff
| Distinct | 8108 |
|---|---|
| Distinct (%) | 39.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 63317.38665 |
| Minimum | -3823143.75 |
|---|---|
| Maximum | 25733526 |
| Zeros | 11586 |
| Zeros (%) | 56.0% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | -3823143.75 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 7690.015625 |
| 95-th percentile | 116886.7906 |
| Maximum | 25733526 |
| Range | 29556669.75 |
| Interquartile range (IQR) | 7690.015625 |
Descriptive statistics
| Standard deviation | 627812.3568 |
|---|---|
| Coefficient of variation (CV) | 9.915323263 |
| Kurtosis | 575.8213057 |
| Mean | 63317.38665 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 21.51128756 |
| Sum | 1311113125 |
| Variance | 3.941483553 × 10¹¹ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 11586 | |
| 29 | 12 | 0.1% |
| 1 | 12 | 0.1% |
| 57 | 11 | 0.1% |
| 63 | 11 | 0.1% |
| 23 | 10 | < 0.1% |
| 41 | 9 | < 0.1% |
| 30 | 9 | < 0.1% |
| 61 | 9 | < 0.1% |
| 39 | 8 | < 0.1% |
| Other values (8098) | 9030 |
| Value | Count | Frequency (%) |
| -3823143.75 | 1 | < 0.1% |
| 0 | 11586 | |
| 1 | 12 | 0.1% |
| 5 | 3 | < 0.1% |
| 7 | 5 | < 0.1% |
| 8 | 2 | < 0.1% |
| 9 | 2 | < 0.1% |
| 10 | 1 | < 0.1% |
| 10.5 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 25733526 | 1 | |
| 21256803.93 | 1 | |
| 19911993 | 1 | |
| 19486783 | 1 | |
| 18633030 | 1 | |
| 18520679 | 1 | |
| 18090151 | 1 | |
| 16355452 | 1 | |
| 16355432 | 1 | |
| 14541805 | 1 |
perc_post_1_hour
| Distinct | 3711 |
|---|---|
| Distinct (%) | 17.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.4048516835 |
| Minimum | 0.0002790957298 |
|---|---|
| Maximum | 1.063829787 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 0.0002790957298 |
|---|---|
| 5-th percentile | 0.05 |
| Q1 | 0.2 |
| median | 0.347826087 |
| Q3 | 0.5769230769 |
| 95-th percentile | 0.9130434783 |
| Maximum | 1.063829787 |
| Range | 1.063550692 |
| Interquartile range (IQR) | 0.3769230769 |
Descriptive statistics
| Standard deviation | 0.2559022829 |
|---|---|
| Coefficient of variation (CV) | 0.632088968 |
| Kurtosis | -0.588257315 |
| Mean | 0.4048516835 |
| Median Absolute Deviation (MAD) | 0.172826087 |
| Skewness | 0.5388585871 |
| Sum | 8383.263811 |
| Variance | 0.06548597837 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0.5 | 1715 | 8.3% |
| 0.3333333333 | 610 | 2.9% |
| 0.25 | 442 | 2.1% |
| 0.6666666667 | 392 | 1.9% |
| 0.2 | 363 | 1.8% |
| 0.1666666667 | 251 | 1.2% |
| 0.4 | 235 | 1.1% |
| 0.1428571429 | 194 | 0.9% |
| 0.6 | 166 | 0.8% |
| 0.2857142857 | 165 | 0.8% |
| Other values (3701) | 16174 |
| Value | Count | Frequency (%) |
| 0.0002790957298 | 1 | |
| 0.0003131850924 | 1 | |
| 0.0003641660597 | 1 | |
| 0.0003799392097 | 1 | |
| 0.0005564830273 | 1 | |
| 0.0006027727547 | 1 | |
| 0.000637755102 | 1 | |
| 0.0006447453256 | 1 | |
| 0.0006626905235 | 1 | |
| 0.0007933359778 | 1 |
| Value | Count | Frequency (%) |
| 1.063829787 | 1 | |
| 0.9994192799 | 1 | |
| 0.9992181392 | 1 | |
| 0.9991928975 | 1 | |
| 0.9991173875 | 1 | |
| 0.9991007194 | 2 | |
| 0.9990627929 | 1 | |
| 0.9990205681 | 1 | |
| 0.9990069513 | 1 | |
| 0.999 | 1 |
users_10h
| Distinct | 486 |
|---|---|
| Distinct (%) | 2.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 38.91659825 |
| Minimum | 1 |
|---|---|
| Maximum | 1204 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 161.9 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 7 |
| median | 27 |
| Q3 | 46 |
| 95-th percentile | 92 |
| Maximum | 1204 |
| Range | 1203 |
| Interquartile range (IQR) | 39 |
Descriptive statistics
| Standard deviation | 71.75327149 |
|---|---|
| Coefficient of variation (CV) | 1.843770389 |
| Kurtosis | 59.75803917 |
| Mean | 38.91659825 |
| Median Absolute Deviation (MAD) | 19 |
| Skewness | 6.800165365 |
| Sum | 805846 |
| Variance | 5148.53197 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 2180 | 10.5% |
| 2 | 823 | 4.0% |
| 3 | 621 | 3.0% |
| 4 | 486 | 2.3% |
| 5 | 433 | 2.1% |
| 7 | 410 | 2.0% |
| 39 | 406 | 2.0% |
| 41 | 401 | 1.9% |
| 8 | 400 | 1.9% |
| 9 | 394 | 1.9% |
| Other values (476) | 14153 |
| Value | Count | Frequency (%) |
| 1 | 2180 | |
| 2 | 823 | 4.0% |
| 3 | 621 | 3.0% |
| 4 | 486 | 2.3% |
| 5 | 433 | 2.1% |
| 6 | 379 | 1.8% |
| 7 | 410 | 2.0% |
| 8 | 400 | 1.9% |
| 9 | 394 | 1.9% |
| 10 | 367 | 1.8% |
| Value | Count | Frequency (%) |
| 1204 | 1 | |
| 1173 | 1 | |
| 1158 | 1 | |
| 1108 | 1 | |
| 1056 | 1 | |
| 1002 | 1 | |
| 1000 | 1 | |
| 876 | 1 | |
| 860 | 1 | |
| 847 | 1 |
id
| Distinct | 20707 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 161.9 KiB |
| gossipcop-1000240645 | 1 |
|---|---|
| gossipcop-903257 | 1 |
| gossipcop-903139 | 1 |
| gossipcop-903138 | 1 |
| gossipcop-903134 | 1 |
| Other values (20702) |
Length
| Max length | 20 |
|---|---|
| Median length | 16 |
| Mean length | 16.95040325 |
| Min length | 16 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 20707 |
|---|---|
| Unique (%) | 100.0% |
Sample
| 1st row | gossipcop-1000240645 |
|---|---|
| 2nd row | gossipcop-1000908841 |
| 3rd row | gossipcop-1009248558 |
| 4th row | gossipcop-1012123555 |
| 5th row | gossipcop-1014383679 |
Common Values
| Value | Count | Frequency (%) |
| gossipcop-1000240645 | 1 | < 0.1% |
| gossipcop-903257 | 1 | < 0.1% |
| gossipcop-903139 | 1 | < 0.1% |
| gossipcop-903138 | 1 | < 0.1% |
| gossipcop-903134 | 1 | < 0.1% |
| gossipcop-903123 | 1 | < 0.1% |
| gossipcop-903118 | 1 | < 0.1% |
| gossipcop-903117 | 1 | < 0.1% |
| gossipcop-903112 | 1 | < 0.1% |
| gossipcop-903107 | 1 | < 0.1% |
| Other values (20697) | 20697 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| gossipcop-1000240645 | 1 | < 0.1% |
| gossipcop-1156925967 | 1 | < 0.1% |
| gossipcop-1012123555 | 1 | < 0.1% |
| gossipcop-1014383679 | 1 | < 0.1% |
| gossipcop-1014616559 | 1 | < 0.1% |
| gossipcop-1014636162 | 1 | < 0.1% |
| gossipcop-1020220396 | 1 | < 0.1% |
| gossipcop-1020335052 | 1 | < 0.1% |
| gossipcop-1042406339 | 1 | < 0.1% |
| gossipcop-1023576750 | 1 | < 0.1% |
| Other values (20697) | 20697 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
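As a rough illustration of that recipe, the sketch below ranks both variables (tied values share the mean of their positions, which is an assumption about tie handling) and then divides the covariance of the ranks by the product of their standard deviations. The function names and the sample data are made up for the example and are not taken from the profiled dataset.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in rx) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in ry) / n)
    # Note: a constant input has zero standard deviation and would divide by zero.
    return cov / (std_x * std_y)


# Hypothetical data: y grows as x cubed, so the relation is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)  # approximately 1.0, since the ranks agree exactly
```

Because only the ranks enter the calculation, any strictly increasing transformation of either variable leaves ρ unchanged.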
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
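The calculation described above can be sketched in a few lines of plain Python. The helper name `pearson_r` and the sample values are illustrative assumptions, not part of the report; the second call checks the invariance claim by shifting and rescaling both variables with positive factors.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Hypothetical, nearly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)

# Separate changes of location and (positive) scale leave r unchanged,
# up to floating-point rounding.
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```

A negative scale factor on one variable would flip the sign of r while keeping its magnitude.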
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
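The pair-counting definition above corresponds to the variant usually called τ-a, which makes no adjustment for ties. Statistical libraries often report a tie-adjusted variant instead, so values may differ on data with many repeated values. A minimal sketch of the unadjusted formula, with a hypothetical function name and made-up data:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product tells whether the pair is ordered the same way in x and y.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical data: one swapped pair among four observations
# gives 5 concordant pairs and 1 discordant pair out of 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```

This direct version compares every pair and therefore runs in quadratic time, which is fine for illustration but slow for tens of thousands of rows.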
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | label | num_nodes | num_tweets | avg_num_retweet | retweet_perc | num_users | total_propagation_time | avg_num_followers | avg_num_friends | avg_time_diff | perc_post_1_hour | users_10h | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | fake | 116 | 110 | 0.045455 | 0.051724 | 61 | 1.525941e+09 | 20970.565217 | 1149.026087 | 7.437060e+05 | 0.991379 | 56 | gossipcop-1000240645 |
| 1 | fake | 5 | 3 | 0.333333 | 0.400000 | 3 | 1.485491e+09 | 158959.750000 | 791.750000 | 6.278000e+03 | 0.200000 | 2 | gossipcop-1000908841 |
| 2 | fake | 3 | 2 | 0.000000 | 0.333333 | 1 | 1.495247e+09 | 317729.000000 | 723.000000 | 0.000000e+00 | 0.333333 | 1 | gossipcop-1009248558 |
| 3 | fake | 15 | 10 | 0.400000 | 0.333333 | 14 | 1.496761e+09 | 26939.000000 | 3446.928571 | 2.765667e+03 | 0.466667 | 7 | gossipcop-1012123555 |
| 4 | fake | 30 | 22 | 0.318182 | 0.266667 | 21 | 1.530403e+09 | 30835.965517 | 5045.862069 | 1.241908e+04 | 0.166667 | 11 | gossipcop-1014383679 |
| 5 | fake | 834 | 619 | 0.345719 | 0.257794 | 729 | 1.536210e+09 | 46004.381753 | 1616.280912 | 3.230924e+04 | 0.009592 | 42 | gossipcop-1014616559 |
| 6 | fake | 207 | 191 | 0.078534 | 0.077295 | 179 | 1.510127e+09 | 16406.587379 | 1466.067961 | 8.888193e+04 | 0.396135 | 78 | gossipcop-1014636162 |
| 7 | fake | 1002 | 857 | 0.168028 | 0.144711 | 655 | 1.530470e+09 | 26516.038961 | 1985.411588 | 3.416425e+06 | 0.965070 | 502 | gossipcop-1020220396 |
| 8 | fake | 1611 | 855 | 0.883041 | 0.469274 | 1495 | 1.534299e+09 | 9555.522360 | 1876.915528 | 8.945355e+04 | 0.973929 | 720 | gossipcop-1020335052 |
| 9 | fake | 2 | 1 | 0.000000 | 0.500000 | 1 | 1.535368e+09 | 42.000000 | 14.000000 | 0.000000e+00 | 0.500000 | 1 | gossipcop-1023576750 |
Last rows
| | label | num_nodes | num_tweets | avg_num_retweet | retweet_perc | num_users | total_propagation_time | avg_num_followers | avg_num_friends | avg_time_diff | perc_post_1_hour | users_10h | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 20697 | real | 34 | 33 | 0.000000 | 0.029412 | 24 | 1.533639e+09 | 1182.666667 | 1157.272727 | 0.0 | 0.382353 | 22 | gossipcop-955991 |
| 20698 | real | 47 | 46 | 0.000000 | 0.021277 | 37 | 1.533197e+09 | 500.391304 | 715.934783 | 0.0 | 0.106383 | 34 | gossipcop-955997 |
| 20699 | real | 15 | 13 | 0.076923 | 0.133333 | 13 | 1.532969e+09 | 858.142857 | 1308.142857 | 94.0 | 0.533333 | 12 | gossipcop-956021 |
| 20700 | real | 22 | 21 | 0.000000 | 0.045455 | 20 | 1.533001e+09 | 939.666667 | 1255.904762 | 0.0 | 0.454545 | 19 | gossipcop-956038 |
| 20701 | real | 44 | 43 | 0.000000 | 0.022727 | 42 | 1.534531e+09 | 293.720930 | 452.325581 | 0.0 | 0.272727 | 39 | gossipcop-956070 |
| 20702 | real | 46 | 45 | 0.000000 | 0.021739 | 43 | 1.535384e+09 | 5930.600000 | 2673.133333 | 0.0 | 0.217391 | 36 | gossipcop-956072 |
| 20703 | real | 38 | 37 | 0.000000 | 0.026316 | 36 | 1.533013e+09 | 334.324324 | 408.270270 | 0.0 | 0.210526 | 34 | gossipcop-956091 |
| 20704 | real | 56 | 54 | 0.018519 | 0.035714 | 53 | 1.533027e+09 | 2063.418182 | 1523.854545 | 40383.0 | 0.160714 | 41 | gossipcop-956093 |
| 20705 | real | 45 | 44 | 0.000000 | 0.022222 | 43 | 1.533372e+09 | 2444.113636 | 1686.863636 | 0.0 | 0.177778 | 37 | gossipcop-956103 |
| 20706 | real | 47 | 46 | 0.000000 | 0.021277 | 44 | 1.533232e+09 | 566.021739 | 635.369565 | 0.0 | 0.255319 | 40 | gossipcop-956128 |